<% component = metadata.sources.file %>

<%= component_header(component) %>

## Config File

<%= component_config_example(component) %>

## Options

<%= options_table(component.options.to_h.values.sort) %>

## Examples

Given the following input:

{% code-tabs %}
{% code-tabs-item title="/var/log/rails.log" %}
```
2019-02-13T19:48:34+00:00 [info] Started GET "/" for 127.0.0.1
```
{% endcode-tabs-item %}
{% endcode-tabs %}

A [`log` event][docs.log_event] will be emitted with the following structure:

{% code-tabs %}
{% code-tabs-item title="log" %}
```javascript
{
  "timestamp": <timestamp>, // current time
  "message": "2019-02-13T19:48:34+00:00 [info] Started GET \"/\" for 127.0.0.1",
  "file": "/var/log/rails.log", // original file
  "host": "10.2.22.122" // current hostname
}
```
{% endcode-tabs-item %}
{% endcode-tabs %}

The `"timestamp"`, `"file"`, and `"host"` keys were automatically added as
context. You can further parse the `"message"` key with a
[transform][docs.transforms], such as the
[`regex` transform][docs.regex_parser_transform].
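
To make the parsing step concrete, here is a minimal Python sketch of the kind of pattern you might apply to the `"message"` key. It is an illustration only, not the `regex` transform's configuration: the pattern and the field names (`timestamp`, `level`, `message`) are assumptions chosen to fit the example line above.

```python
import re

# Hypothetical pattern for the example line above; not a Vector default.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) \[(?P<level>\w+)\] (?P<message>.*)$"
)

def parse_message(message: str) -> dict:
    """Split the raw message into named fields, or return {} if it does not match."""
    match = LINE_PATTERN.match(message)
    return match.groupdict() if match else {}

fields = parse_message(
    '2019-02-13T19:48:34+00:00 [info] Started GET "/" for 127.0.0.1'
)
```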

## How It Works [[sort]]

<%= component_sections(component) %>

### Auto Discovery

Vector continually looks for new files matching any of your include patterns,
at a frequency controlled by the `glob_minimum_cooldown` option. If a new file
is added that matches any of the supplied patterns, Vector will begin tailing
it. Vector maintains a unique list of files and will not tail a file more than
once, even if it matches multiple patterns. You can read more about how Vector
identifies files in the [Identification](#file-identification) section.
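
The discovery loop can be sketched roughly as follows. This is a simplified Python illustration, not Vector's implementation: the `discover` helper is hypothetical, and it deduplicates by path for brevity, whereas Vector identifies files by fingerprint.

```python
import glob
import os
import tempfile

def discover(patterns, known):
    """Return newly matched paths, skipping any path already in `known`."""
    new_files = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if path not in known:
                known.add(path)
                new_files.append(path)
    return new_files

with tempfile.TemporaryDirectory() as log_dir:
    open(os.path.join(log_dir, "app.log"), "w").close()
    # Both patterns match app.log, but it is only reported once.
    patterns = [os.path.join(log_dir, "*.log"), os.path.join(log_dir, "app*")]
    known = set()
    first_scan = discover(patterns, known)

    # A file created between scans is picked up on the next scan.
    open(os.path.join(log_dir, "api.log"), "w").close()
    second_scan = discover(patterns, known)
```

In a real tailer, the scan would be repeated on a timer, which is the role `glob_minimum_cooldown` plays here.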

### Checkpointing

Vector checkpoints the current read position in the file after each successful
read. This ensures that Vector resumes where it left off if restarted,
preventing data from being read twice. The checkpoint positions are stored in
the data directory, which is specified via the
[global `data_dir` option][docs.configuration.data-directory] but can be
overridden via the `data_dir` option in the `file` source directly.
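
As a rough illustration of the idea, the following Python sketch persists a byte offset after each read and resumes from it on the next read. The JSON checkpoint file, the helper names, and keying offsets by path are all assumptions made for the example; this is not Vector's on-disk format.

```python
import json
import os
import tempfile

def load_offsets(checkpoint_path):
    """Load saved offsets, or start empty if no checkpoint file exists yet."""
    if not os.path.exists(checkpoint_path):
        return {}
    with open(checkpoint_path) as f:
        return json.load(f)

def read_new_data(log_path, checkpoint_path):
    """Read from the last saved offset, then persist the new offset."""
    offsets = load_offsets(checkpoint_path)
    with open(log_path, "rb") as f:
        f.seek(offsets.get(log_path, 0))
        data = f.read()
        offsets[log_path] = f.tell()
    with open(checkpoint_path, "w") as f:
        json.dump(offsets, f)
    return data

with tempfile.TemporaryDirectory() as data_dir:
    log_path = os.path.join(data_dir, "app.log")
    checkpoint_path = os.path.join(data_dir, "checkpoints.json")

    with open(log_path, "wb") as f:
        f.write(b"first\n")
    first_read = read_new_data(log_path, checkpoint_path)

    # Offsets are reloaded from disk on every call, as they would be after a restart.
    with open(log_path, "ab") as f:
        f.write(b"second\n")
    second_read = read_new_data(log_path, checkpoint_path)
```

Because the second call starts at the saved offset, it returns only the appended line rather than the whole file.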

### File Deletions

If a file is deleted, Vector will flush the current buffer and stop tailing
the file.

### File Identification

By default, Vector identifies files by computing a [cyclic redundancy check
(CRC)][url.crc] over the first 256 bytes of the file. This serves as a
fingerprint to uniquely identify the file. The number of bytes read can be
controlled via the `fingerprint_bytes` and `ignored_header_bytes` options.

This strategy avoids the common pitfalls of using device and inode numbers,
since inode numbers can be reused across files. This enables Vector to
[properly tail files in the event of rotation][docs.correctness].
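
The fingerprinting idea can be sketched in Python as shown below. Two things here are assumptions: `zlib.crc32` stands in for whichever CRC variant Vector actually uses, and `ignored_header_bytes` is taken to mean "skip this many leading bytes before checksumming `fingerprint_bytes` bytes".

```python
import zlib

def fingerprint(data: bytes, fingerprint_bytes: int = 256,
                ignored_header_bytes: int = 0) -> int:
    """Checksum a fixed window at the start of the content (assumed semantics)."""
    start = ignored_header_bytes
    window = data[start:start + fingerprint_bytes]
    return zlib.crc32(window)

original = b"2019-02-13 line one\n" * 20   # 400 bytes of content
grown = original + b"more data appended\n"  # same file after more writes
other = b"different start\n" + original     # a different file

same_after_growth = fingerprint(original) == fingerprint(grown)
differs_for_other = fingerprint(original) != fingerprint(other)
```

Since only the leading window is checksummed, appending to a file or renaming it leaves the fingerprint unchanged, while a file with a different beginning gets a different one.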

### File Rotation

Vector follows files across rotations in the manner of `tail`. Because of the
way Vector [identifies files](#file-identification), it will properly recognize
newly rotated files regardless of whether you use the `copytruncate` or
`create` directive. To ensure Vector handles rotated files properly, we
recommend:

1. Ensure the `include` paths include rotated files. For example, use
   `/var/log/nginx*.log` to recognize `/var/log/nginx.2.log`.
2. Use either the `copytruncate` or `create` directive when rotating files.
   If historical data is compressed, or altered in any way, Vector will not be
   able to properly identify the file.
3. Only delete files when they have exceeded the `ignore_older` age. Although
   losing data this way is extremely rare, this ensures you do not delete data
   before Vector has had a chance to ingest it.
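
To check whether a pattern covers your rotated file names, you can try it against sample paths. The sketch below uses Python's `fnmatch` module as a stand-in; Vector's glob implementation may differ in edge cases, so treat this as a quick sanity check rather than a guarantee.

```python
from fnmatch import fnmatchcase

PATTERN = "/var/log/nginx*.log"

candidates = [
    "/var/log/nginx.log",       # the live file
    "/var/log/nginx.2.log",     # a rotated file
    "/var/log/nginx.2.log.gz",  # a compressed rotation
]

# Map each candidate path to whether the pattern matches it.
matches = {path: fnmatchcase(path, PATTERN) for path in candidates}
```

Here the live and rotated files match, while the compressed rotation does not, which is consistent with the second recommendation above.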

### Globbing

[Globbing][url.globbing] is supported in all provided file paths. Files will
be [autodiscovered](#auto-discovery) continually at a rate defined by the
`glob_minimum_cooldown` option.

### Line Delimiters

Each line is read until a newline delimiter (the `0xA` byte) or `EOF` is found.
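
The rule above can be expressed as a small Python sketch. It reads one byte at a time for clarity rather than speed, and the `read_lines` helper is a hypothetical name used only for this example.

```python
import io

def read_lines(stream):
    """Yield each line, ending at a 0xA byte or at EOF."""
    buffer = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            # EOF: emit whatever is buffered, if anything.
            if buffer:
                yield bytes(buffer)
            return
        if byte == b"\n":  # the 0xA delimiter
            yield bytes(buffer)
            buffer.clear()
        else:
            buffer.extend(byte)

lines = list(read_lines(io.BytesIO(b"first\nsecond\nno trailing newline")))
```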

### Read Position

By default, Vector will read new data only for newly discovered files, similar
to the `tail` command. You can read from the beginning of the file by setting
the `start_at_beginning` option to `true`.

Previously discovered files will be [checkpointed](#checkpointing), and the
read position will resume from the last checkpoint.
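
Put together, the starting offset for a file could be chosen along these lines. This is a sketch of the decision described above, assuming a checkpoint always takes precedence; the `start_offset` function and its parameters are hypothetical.

```python
def start_offset(file_size, checkpoint=None, start_at_beginning=False):
    """Pick the byte offset at which to start reading a file."""
    if checkpoint is not None:
        # Previously discovered file: resume from the last checkpoint.
        return checkpoint
    # Newly discovered file: start of file, or end of file like `tail`.
    return 0 if start_at_beginning else file_size

new_file_default = start_offset(file_size=1024)
new_file_from_start = start_offset(file_size=1024, start_at_beginning=True)
resumed = start_offset(file_size=1024, checkpoint=512)
```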

## Troubleshooting

<%= component_troubleshooting(component) %>

## Resources

<%= component_resources(component) %>